Evaluation of scoring functions for protein multiple sequence alignment using structural alignments

Authors

  • Sivan Yogev
  • Shlomo Moran
Abstract

Aligning a group of protein sequences to obtain a meaningful Multiple Sequence Alignment (MSA) is a basic tool in current bioinformatics research. The development of new MSA algorithms raises the need for an efficient way to evaluate the quality of an alignment, in order to select the best alignment among those produced by the available algorithms. A natural way to evaluate the quality of alignments is to use scoring functions, which assign to each alignment a number reflecting its quality. Different scoring functions for MSA have been proposed over the years, which in turn raised the need for methodical ways to assess the quality of such functions. A few methods for assessing the quality of scoring functions for pairwise alignments have been proposed. These methods compare alignments that are optimal for a given scoring function to structural alignments (alignments obtained through analysis of the three-dimensional structures of related proteins). A main obstacle to using these methods for evaluating scoring functions for alignments of k > 2 sequences is the lack of efficient algorithms for computing optimal alignments (for a given scoring function) of even a moderate number of sequences. We propose a framework for bypassing this difficulty, which is based on computing the correlation between suboptimal alignments. An inherent issue that our method must address is the identification of an appropriate sample set of alignments to be used in the correlation test. We describe this problem, suggest a solution, and report results obtained with this solution. Our results indicate that for most scoring functions, the addition of appropriate gap penalties improves the quality of the function. One notable exception is COFFEE, for which the average improvement after adding gap penalties was negligible in all of our experiments. COFFEE also had the best average quality over the entire benchmark tested.
Notations and Abbreviations

  • MSA: Multiple Sequence Alignment
  • NW: Needleman-Wunsch algorithm/scoring function
  • SoP: Sum of Pairs scoring function
  • CoG: Center of Gravity scoring function
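As a rough illustration of the kind of correlation test the abstract describes, the sketch below scores a few toy alignments with a Sum of Pairs function that includes a gap penalty, measures how closely each alignment agrees with a reference (standing in for a structural alignment), and correlates the two. Everything here is an assumption made for illustration: the match/mismatch/gap values, the use of shared aligned residue pairs as the similarity measure, the choice of Pearson correlation, and the toy sequences. The abstract does not specify any of these, so this should not be read as the authors' actual procedure.

```python
from itertools import combinations

def sop_score(alignment, match=1.0, mismatch=-1.0, gap=-2.0):
    """Sum of Pairs score with a linear gap penalty (toy scoring values, assumed)."""
    total = 0.0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == "-" and b == "-":
                continue          # gap-gap columns contribute nothing
            if a == "-" or b == "-":
                total += gap      # residue aligned against a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

def aligned_pairs(alignment):
    """Set of residue pairs ((seq, pos), (seq, pos)) that share a column."""
    counters = [0] * len(alignment)   # next residue index in each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(residues, 2))
    return pairs

def similarity_to_reference(alignment, reference):
    """Fraction of the reference's aligned residue pairs that the alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(aligned_pairs(alignment) & ref) / len(ref)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: three alignments of the sequences ACDE, ADE and ACE.
reference = ["ACDE", "A-DE", "AC-E"]      # stands in for a structural alignment
candidates = [
    reference,
    ["ACDE", "AD-E", "AC-E"],
    ["ACDE", "-ADE", "ACE-"],
]

scores = [sop_score(c) for c in candidates]
similarities = [similarity_to_reference(c, reference) for c in candidates]
quality = pearson(scores, similarities)
print(scores, similarities, quality)
```

Under this reading, a scoring function whose values track similarity to the reference more closely yields a higher correlation. Which alignments go into `candidates` strongly affects the result, which is the sample-selection issue the abstract raises.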


Similar resources

Profile alignment scoring functions A comparison of scoring functions for protein sequence profile alignment

Motivation: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTAL...


A comparison of scoring functions for protein sequence profile alignment

MOTIVATION In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTAL...


QUASAR - scoring and ranking of sequence-structure alignments

SUMMARY Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult pro...


Improving Profile-Profile Alignments via Log Average Scoring

Alignments of frequency profiles against frequency profiles have a wide range of applications in currently used bioinformatics analysis tools, from multiple alignment methods based on the progressive alignment approach to the detection of structural similarities based on remote sequence homology. We present the new log average scoring approach to calculating the score to be used with alignmen...


Aligning Protein Sequences with Predicted Secondary Structure

Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequence...




Publication date: 2006